Abstract

In a recent seminal paper, Gibson and Wexler ([1], GW) take important steps to formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of parameters. One of their aims is to characterize the complexity of learning in such spaces. For example, they demonstrate that even in finite spaces, convergence may be a problem, since it is possible under some single-step gradient ascent methods to remain at a local maximum. From the standpoint of learning theory, however, GW leave open several questions that can be addressed by a more precise formalization in terms of Markov structures (a possible formalization suggested but left unpursued in a footnote of GW). In this paper we explicitly formalize learning in a finite parameter space as a Markov structure whose states are parameter settings. Several important results follow directly from this characterization, including: (1) a corrected version of GW's central convergence proof; (2) an explicit formula for calculating the transition probabilities between hypotheses, and the existence of "problem states" in addition to local maxima; (3) an explicit calculation of the time needed to converge, in terms of number of (positive) examples; (4) the convergence and comparison of several variants of the GW learning procedure, e.g., random walk; (5) batch- and PAC-style learning bounds for the model.
1 Introduction: The Triggering Model as a Markov Structure

Recently, Gibson and Wexler ([1], GW) have begun to formalize the notion of language learning in a (finite) space whose grammars (and languages) are characterized by a finite number of parameters, or 1-dimensional Boolean-valued arrays, n long. A grammar in this space is simply a particular n-length array of 0's and 1's; hence there are $2^n$ possible grammars (languages). One of Gibson and Wexler's aims is to establish that under some simple hill-climbing learning regimes, namely single-step gradient ascent, some linguistically natural, finite spaces are unlearnable, in the sense that positive-only examples lead to local maxima: incorrect hypotheses from which a learner can never escape. More broadly, they wish to show that learnability in such spaces is still an interesting problem, in that there is a substantive learning theory concerning feasibility, convergence time, and the like, that must be addressed beyond traditional linguistic theory and that might even choose between otherwise adequate linguistic theories.

In this paper, we choose as a convenient starting point their Triggering Learning Algorithm (TLA) to focus our investigation of parameter learning. Our central result is that the performance of this algorithm is completely modeled by a Markov chain. The remainder of the current paper is devoted to exploring the basic consequences of this fact.

Let us first review the GW model and the TLA, following GW [1]. The basic framework is that of identification in the limit. The learner (child) starts out in an arbitrary state, i.e., some setting of the n parameter values. The learner receives a (countably infinite) sequence of positive example sentences drawn from some target language L. After each presentation, the learner can either (i) stay in the same state, or (ii) move to a new hypothesis state, using the algorithm given below. If after some finite number of examples the learner converges to the correct target language (= parameter settings) and never changes state, then it has correctly identified the target language; otherwise, it does not converge.

In addition, in the GW model the language learner obeys two fundamental constraints: (1) the single-value constraint: the learner can change only 1 parameter value at a time; and (2) the greediness constraint: if the learner is given a positive example it cannot recognize (accept), and if the learner changes one parameter value and finds that it can accept the example, then the learner retains that new parameter value. Finally, we also recall GW's definition of a local trigger (minor notational changes aside): given values for all parameters but one, a local trigger for value v of parameter $P_i$ is a sentence s from the target grammar such that s is grammatical iff $P_i = v$. GW then state their TLA as follows:

• [Initialize] Step 1. Start at some random point in the (finite) space of possible parameter settings, specifying a single hypothesized grammar with its resulting extension as a language;
• [Process input sentence] Step 2. Receive a positive example sentence $s_t$ at time t (examples drawn from the language of a single target grammar, $L(G_t)$), from a uniform distribution on the language (we shall be able to relax this distributional constraint later on);
• [Learnability on error detection] Step 3. If the current grammar parses (generates) $s_t$, then go to Step 2; otherwise, continue.
• [Single-step gradient ascent] Select a single parameter at random, uniformly with probability 1/n, to flip from its current setting, and change it (0 mapped to 1, 1 to 0) iff that change allows the current sentence to be analyzed; otherwise go to Step 2.

Of course, this algorithm never halts in the usual sense. GW aim to show under what conditions this algorithm converges "in the limit", that is, after some finite but unknown number of steps, the correct target parameter settings will be selected and never be changed. Their central claim is stated as their theorem (p. 7 in their manuscript):

Theorem 1 As long as the probability is always greater than a lower bound b (b > 0) that the learner will encounter a local trigger for some incorrectly-set parameter and then reset $P_i$ accordingly to the target value, it turns out that the target grammar can always be learned using the Triggering Learning Algorithm.

(Note that the notion of trigger does not enter into the statement of the TLA or the constraints the TLA employs, but only into the statement of the theorem.)

1.1 The Markov Formulation

From the standpoint of learning theory, however, GW leave open several questions that can be addressed by a more precise formalization of this model in terms of Markov chains (a possible formalization suggested but left unpursued in footnote 9 of GW). We can picture the GW hypothesis space as a set of $2^n$ points, each corresponding to one particular array of parameter settings (languages, grammars). Call each point a hypothesis, or simply a state, of this space. As is conventional, define these languages over some alphabet Σ as subsets of Σ*. One of them is the target language (grammar). We arbitrarily place the (single) target grammar at the center of this space. Since by the TLA the learner is restricted to changing at most 1 binary value in a single step, the theoretically possible transitions between states can be drawn as (directed) lines connecting parameter arrays (hypotheses) that differ by at most 1 binary digit (a 0 or 1 in some corresponding position in their arrays); that is, arrays at Hamming distance at most 1. We may further place weights on the transitions from state i to state j corresponding to the nonzero b's mentioned in the theorem above; these correspond to the probabilities that the learner will move from hypothesis state i to state j. In fact, as we shall show below, given a distribution over $L(G_t)$, we can further carry out the calculation of the actual b's themselves.
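To make the TLA and its transition weights concrete, here is a minimal simulation sketch. It assumes that grammars are written as tuples of parameter bits and that each language is available as a finite set of strings; the two-parameter family in `LANGUAGES` and its target are hypothetical, chosen only for illustration. `tla_step` applies Steps 3 and 4 to one example, and `estimate_row` estimates by sampling the probabilities of moving from a given state to each state on a single example, i.e., one row of the b's discussed above.

```python
import random
from collections import Counter

# Hypothetical 2-parameter family (illustration only): each grammar, written as a
# tuple of parameter bits, maps to its language given extensionally as a set.
LANGUAGES = {
    (0, 0): {"a", "b"},
    (0, 1): {"a", "c"},
    (1, 0): {"b"},
    (1, 1): {"c", "d"},
}
TARGET = (0, 1)  # hypothetical target grammar


def tla_step(state, sentence, rng):
    """Apply TLA Steps 3-4 to one positive example and return the new state."""
    if sentence in LANGUAGES[state]:
        return state  # Step 3: the current grammar parses the sentence, so stay.
    # Step 4: pick one parameter uniformly at random (probability 1/n) and flip it.
    i = rng.randrange(len(state))
    flipped = tuple(b ^ 1 if j == i else b for j, b in enumerate(state))
    # Greediness: keep the flip only if the new grammar parses the sentence.
    return flipped if sentence in LANGUAGES[flipped] else state


def estimate_row(state, trials=20000, seed=0):
    """Monte Carlo estimate of P[state -> j] for one example drawn uniformly from L(G_t)."""
    rng = random.Random(seed)
    target_strings = sorted(LANGUAGES[TARGET])
    counts = Counter(
        tla_step(state, rng.choice(target_strings), rng) for _ in range(trials)
    )
    return {j: c / trials for j, c in counts.items()}
```

For example, from (0, 0) the learner can only leave on the string "c", and then only by flipping the second parameter, so `estimate_row((0, 0))` should put roughly 1/2 × 1/2 = 1/4 on (0, 1) and the rest on (0, 0), while `estimate_row(TARGET)` puts all of its mass on the target, which behaves as an absorbing state.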



Thus, we can picture the TLA learning space as a directed, labeled graph V with $2^n$ vertices. (GW construct an identical transition diagram in the description of their computer program for calculating local maxima. However, this diagram is not explicitly presented as a Markov structure; it does not include transition probabilities. Of course, topologically both structures must be identical.) More precisely, we can make the following remarks about the TLA system GW describe.

Remark. The TLA system is memoryless, that is, given a sequence s of sentences up to time $t_i$, the selection of hypothesis $h_i$ depends only on the current hypothesis and the sentence $s_i$, and not (directly) on previous sentences, i.e.,

$$P\{x(t_i) \le X_i \mid x(t),\ t \le t_{i-1}\} = P\{x(t_i) \le X_i \mid x(t_{i-1})\}$$

In other words, the TLA system is a classical discrete stochastic process, in particular a discrete Markov process, or Markov chain. We can now use the theory of Markov chains to describe TLA parameter spaces [6]. For example, as is well known, we can convert the graphical representation of a Markov chain M with N states (here $N = 2^n$) to an N × N matrix T, where each matrix entry (i, j) represents the transition probability from state i to state j. A single step of the Markov process corresponds to multiplication by T, two steps to T × T = $T^2$, and in general k steps are given by $T^k$. A "1" entry in cell (i, j) of $T^k$ as k grows means that the system converges with probability 1 to state j, given that it starts in state i.

As mentioned, not all these transitions will be possible in general. For example, by the single-value hypothesis, the system can only move 1 Hamming bit at a time. Also, by assumption, only differences in surface strings can force the learner from one hypothesis state to another. For instance, if state i corresponds to a grammar that generates a language that is a proper subset of the language of another grammar hypothesis j, there can never be a transition (nonzero b) from j to i, while a transition from i to j is possible (provided i and j are Hamming neighbors and some target sentence lies in the language of j but not in that of i). Further, by assumption and the TLA, it is clear that once we reach the target grammar there is nothing that can move the learner from this state, since no remaining positive evidence will cause the learner to change its hypothesis. Thus, there must be a loop from the target state to itself, with probability 1, and no exit arcs. In the Markov chain literature this is known as an Absorbing State (AS). Obviously, a state that only leads to an AS will also drive the learner to that AS. Finally, if a state corresponds to a grammar that generates some sentences of the target, there is always a loop from that state to itself that has some nonzero probability. Clearly, one can conclude at once the following learnability result:

Theorem 2 Given a Markov chain C corresponding to a GW TLA learner, there exists exactly 1 AS (corresponding to the target grammar/language) iff C is learnable.

Proof. (⇐) By assumption, C is learnable. Now assume, for the sake of contradiction, that there is not exactly 1 AS. Then there must be either 0 AS or more than 1 AS. In the first case, by the definition of an absorbing state, there is no hypothesis in which the learner will remain forever. Therefore C is not learnable, a contradiction. In the second case, without loss of generality, assume there are exactly two absorbing states, the first, S, corresponding to the target parameter setting, and the second, S', corresponding to some other setting. By the definition of an absorbing state, in the limit C will with some nonzero probability enter S' and never exit it. Then C is not learnable, a contradiction. Hence our assumption that there is not exactly 1 AS must be false.

(⇒) Assume that there exists exactly 1 AS, i, in the Markov chain C. Then, by the definition of an absorbing state, after some finite number of steps, no matter what the starting state, C will end up in state i, corresponding to the target grammar. ∎

Note that this approach avoids a crucial flaw in the proof given in GW (pp. 7-8 in manuscript):

"That is, if the learner never goes through the same state twice, then she is bound to end in the target state at some point, because the parameter space is finite in size. Thus the probability of avoiding the target state forever is equivalent to the probability of cycling forever through some ordered set of states (a cycle). We can divide the parameter space into a finite set of minimal cycles, where each minimal cycle contains no cycles as a subpart. (Because the parameter space is finite, the set of minimal cycles in the parameter space is also finite.) For each minimal cycle, we can now calculate the probability that the learner remains in that cycle forever... the probability of staying in the cycle in the limit (forever) is zero. The same is true for all of the finitely-many minimal cycles, so that the probability of staying in any of these cycles in the limit is also zero. Thus the probability of ending up at the target state in the limit is one."

In brief, GW attempt to show that the probability of the learner avoiding the target forever is zero by showing that the fact that some minimal cycle occurs infinitely often makes the probability of the infinite sequence zero. In other words, every way in which the learner avoids the target has probability zero. Thus they conclude that the probability of the event

Event = Learner avoids target forever

is zero. More precisely, they claim that

$$P\Big[\bigcup_k A_k\Big] = 0,$$

where each $A_k$ is a path avoiding the target, and the union is taken over the set of all such paths. However, as is well known, this conclusion follows from $P[A_k] = 0$ for every k only when the union is taken over a countable set of elements. In the example at hand, the crucial omission in the argument is that there are an uncountable number of ways in which the learner can avoid the target. This is because there are an uncountable number of sequences of numbers between 1 and M − 1. The base M − 1 expansion of any real number in the interval [0, 1) would yield such a sequence (e.g., consider an irrational expansion such as that of the square root of 2).
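The Markov formulation sidesteps this path-by-path accounting: the probability of being in each state after k steps is read off from $T^k$, and Theorem 2's criterion is a check on the diagonal of T. The sketch below illustrates both under stated assumptions: it uses exact fractions from Python's standard library, and the two 3-state matrices `ONE_AS` and `TWO_AS` are hypothetical examples, not chains derived from GW's grammars.

```python
from fractions import Fraction

Matrix = list[list[Fraction]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product a x b."""
    return [
        [sum((a[i][k] * b[k][j] for k in range(len(b))), Fraction(0))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(t: Matrix, k: int) -> Matrix:
    """T^k; entry (i, j) is the probability of being in state j after k steps from i."""
    result = [[Fraction(int(i == j)) for j in range(len(t))] for i in range(len(t))]
    for _ in range(k):
        result = mat_mul(result, t)
    return result


def absorbing_states(t: Matrix) -> list[int]:
    """States whose self-transition probability is 1 (no exit arcs)."""
    return [i for i in range(len(t)) if t[i][i] == 1]


zero, one = Fraction(0), Fraction(1)
half, third = Fraction(1, 2), Fraction(1, 3)
# Hypothetical chain with a single AS (state 2): learnable by Theorem 2.
ONE_AS = [[half, half, zero], [zero, half, half], [zero, zero, one]]
# Hypothetical chain with two AS (states 1 and 2): not learnable.
TWO_AS = [[third, third, third], [zero, one, zero], [zero, zero, one]]
```

After 50 steps, row 0 of `mat_pow(ONE_AS, 50)` has all but $51/2^{50}$ of its mass on state 2, whereas in `TWO_AS` the mass starting from state 0 tends to split evenly between the two absorbing states, so with probability 1/2 the learner never reaches state 2. In both cases the limit comes from the matrix itself, not from summing probabilities over individual paths.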



Since there are an uncountable number of ways in which the event of avoiding the target forever can be realized, the fact that each such way has probability zero does not imply that the total event has probability zero as well. To see this, consider a random variable X with a uniform distribution on [0, 1]. Now consider the event:

Event: X < 1/2

There are many ways in which this event could occur, e.g., X = 1/4, X = 1/3, X = 0.234, etc. Each of these ways has probability zero, i.e., P[X = 1/4] = 0, P[X = 1/3] = 0, and so on. However, we know that the probability of the event X < 1/2 is 1/2, not zero. This is because there are an uncountable number of ways in which the event X < 1/2 could take place. Thus the proof as given in [1] is incorrect. The correct way to formalize the proof is by resorting to an explicit Markov formulation, as suggested but not executed in GW's footnote 9, and as we established above. A similar conceptual difficulty seemingly leads to their failure to note that there may be other states, besides local maxima, for which convergence may not occur.

Corollary 1 Given a Markov chain corresponding to a (finite) family of grammars in a GW learning system, if there exist 2 or more AS, then that family is not learnable.

Example. Consider the GW 3-parameter system. Its binary parameters are: (1) Spec(ifier) first (0) or last (1); (2) Comp(lement) first (0) or last (1); and (3) Verb Second (V2) does not exist (0) or does exist (1). By Specifier, GW follow the standard linguistic convention of an element that is part of a phrase and "specifies" that phrase, roughly like the old in the old book; by Complement, GW roughly mean a phrase's arguments, like an ice-cream in John ate an ice-cream or with envy in green with envy. There are also 7 possible "words" in this language: S, V, O, O1, O2, Adv, and Aux, corresponding to Subject, Verb, Object, Direct Object, Indirect Object, Adverb, and Auxiliary. There are 12 possible surface strings for each (−V2) grammar and 18 possible surface strings for each (+V2) grammar if we restrict ourselves to unembedded or "degree-0" examples for reasons of psychological plausibility (see GW for discussion). Note that the "surface strings" of these languages are actually sequences of phrases such as Subject, Verb, and Object. Figure (3) of GW summarizes the possible binary parameter settings in this system. For instance, parameter setting (5) corresponds to the array [0 1 0] = Specifier first, Comp last, and −V2, which works out to the possible basic English surface phrase order of Subject-Verb-Object (SVO). As shown in GW's figure (3), the other possible arrangements of surface strings corresponding to this parameter setting include S V; S V O1 O2 (two objects, as in give John an ice-cream); S Aux V (as in John is a nice guy); S Aux V O; S Aux V O1 O2; Adv S V (where Adv is an Adverb, like quickly); Adv S V O; Adv S V O1 O2; Adv S Aux V; Adv S Aux V O; and Adv S Aux V O1 O2.

Suppose SVO (setting #5 = [0 1 0]) is the target grammar (language). With the GW 3-parameter system there are $2^3 = 8$ possible hypotheses, so we can draw this as an 8-point Markov configuration space, as shown in the figure above. The shaded rings represent increasing Hamming distances from the target. Each labeled circle is a Markov state, a possible array of parameter settings or grammar, and hence extensionally specifies a possible target language. Each state is exactly 1 binary digit away from its possible transition neighbors. Each directed arc between the points is a possible (nonzero) transition from state i to state j; we shall show how to compute these below. We assume that the target grammar, a double circle, lies at the center. This corresponds to the English SVO language. Surrounding the bull's-eye target are the 3 other parameter arrays that differ from [0 1 0] by one binary digit each; we picture these as a ring 1 Hamming bit away from the target: [0 1 1], corresponding to GW's parameter setting 6 in their figure 3 (Spec-first, Comp-final, +V2, basic order SVO+V2); [0 0 0], corresponding to GW's setting 7 (Spec-first, Comp-first, −V2), basic order SOV; and [1 1 0], GW's setting 1 (Spec-final, Comp-final, −V2), basic order VOS.

Around this inner ring lie 3 parameter-setting hypotheses, all 2 binary digits away from the target: [0 0 1], [1 0 0], and [1 1 1] (grammars 8, 3, and 2 in GW figure 3). Note that by the Single Value hypothesis the learner can only move one grey ring towards or away from the target at any one step. Finally, one more ring, three binary digits different from the target, is the hypothesis [1 0 1], corresponding to grammar 4. It is easy to see from inspection of the figure that there are exactly 2 absorbing states in this Markov chain, that is, states that have no exit arcs. One AS is the target grammar (by definition). The other AS is state 2. Note that state 4 is also a sink (a so-called "closed state" in Markov terminology) that leads only to state 4 or state 2. These two states correspond to the local maxima at the head of GW's figure 4. Hence this system is not learnable. In addition to these local maxima, the next section shows that there are in fact other states from which the learner can never reach the target.

2 Derivation of Transition Probabilities for the Markov TLA Structure

The transition probabilities for the language family can be computed by a direct extension of the procedure given in GW. Let the target language $L_t$ consist of the strings $s_1, s_2, \ldots$, i.e.,

$$L_t = \{s_1, s_2, s_3, \ldots\}$$

Let there be a probability distribution P on these strings. Suppose the learner is in a state s corresponding to the language $L_s$. Suppose it now receives the string $s_j$, which occurs with probability $P(s_j)$. There are two cases to examine, depending upon whether or not the string $s_j$ is analyzable by the grammar corresponding to the current parameter setting.

Case I. Suppose the learner can syntactically analyze the received string $s_j$. By the TLA, it will not change its parameter values. In the Markov chain formulation, the learner remains in the same state. Remember that this state corresponds to the language $L_s$. Also note that this situation arises only when $s_j \in L_s \cap L_t$. Therefore, for such a string, the learner remains in state s with probability $P(s_j)$.

Case II. Suppose the learner cannot syntactically analyze the string. Then $s_j \notin L_s$. By the TLA, the learner chooses a parameter at random and flips it; if the new parameter setting makes $s_j$ analyzable, it retains this value and moves to the corresponding state; otherwise it remains in its original state s. Let us examine this situation using the Markov chain formulation. The learner is in state s. It has n neighboring states, each at a Hamming distance of 1 from itself. The learner picks one of them uniformly at random. Imagine that $n_j$ of these neighboring states correspond to languages which contain $s_j$. If the learner picks any one of these states (which of course it does with probability $n_j/n$), it moves to that state. If the learner picks any of the other states (with probability $(n - n_j)/n$), then it remains in state s. Note that $n_j$ could be 0, which means that none of the neighboring states would allow the string to be analyzed. The maximum value $n_j$ could take is n. Thus we see that the probability that the learner remains in state s is $P(s_j)\,(n - n_j)/n$. The probability that it moves to each of the other $n_j$ states is $P(s_j)\,(1/n)$.

Clearly this allows us to compute the probability that the learner will remain in its original state s as the sum of the probabilities of the above two cases, namely the following expression:

$$P[s \to s] = \sum_{s_j \in L_s \cap L_t} P(s_j) + \sum_{s_j \in L_t \setminus L_s} \left(1 - \frac{n_j}{n}\right) P(s_j)$$

The above expression is still a little untidy because of the $n_j$'s in it. We would like to clean it up a little. To do this, consider the way we would compute the transition probability from state s to some other neighboring state, say k, in the chain. From the above analysis, we see that such a transition will occur with probability 1/n for all the strings that are in the language $L_k$ but not in the language $L_s$. The strings themselves occur with probability $P(s_j)$ each, and so the transition probability is

$$P[s \to k] = \sum_{s_j \in (L_t \cap L_k) \setminus L_s} \frac{1}{n} P(s_j)$$

where \ is the set difference symbol. It is easy to see that

$$s_j \in (L_t \cap L_k) \setminus L_s \iff s_j \in (L_t \cap L_k) \setminus (L_t \cap L_s)$$

Thus we can rewrite the transition probability as

$$P[s \to k] = \sum_{s_j \in (L_t \cap L_k) \setminus (L_t \cap L_s)} \frac{1}{n} P(s_j)$$

Since we have shown this in generality, so that for any given target we can compute the transition probabilities between any two states in the Markov chain formulation of the parameter space, the self-transition probability can now be given as

$$P[s \to s] = 1 - \sum_{k} P[s \to k]$$

where the sum runs over the states k neighboring s.

Finally, given any parameter space with n parameters, we have $2^n$ languages. Fixing one of them as the target language $L_t$, we obtain the following procedure for constructing the corresponding Markov chain. Note that this is the GW procedure for finding local maxima, with the addition of a probability measure on the language family.

• (Assign distribution) First fix a probability measure P on the strings of the target language $L_t$.
• (Enumerate states) Assign a state to each language, i.e., each $L_i$.
• (Normalize by the target language) Intersect all languages with the target language to obtain, for each i, the language $L'_i = L_i \cap L_t$. Thus with state i, associated with $L_i$, we now associate the language $L'_i$.
• (Take set differences) Now for any two states i and k, if they are more than 1 Hamming distance apart, then the transition $P[i \to k] = 0$. If they are 1 Hamming distance apart, then $P[i \to k] = \frac{1}{n} P(L'_k \setminus L'_i)$.

This model captures the dynamics of the TLA completely.

Example. Consider again the 3-parameter system in the previous figure, with target language 5. We can calculate the following set differences to build the Markov figure straightforwardly.

1. $L_1 \cap L_5 = \emptyset$ (no strings in common between 1 and target 5).
2. $L_2 \cap L_5$ = {S V, S V O, S V O1 O2, S Aux V, S Aux V O, S Aux V O1 O2}.
3. $L_3 \cap L_5 = \emptyset$.
4. $L_4 \cap L_5$ = {S V, S V O, S Aux V}.
5. $L_5 \cap L_5 = L_5$.
6. $L_6 \cap L_5$ = {S V, S V O, S V O1 O2, S Aux V, S Aux V O, S Aux V O1 O2}.
8. $L_8 \cap L_5$ = {S V, S V O, S Aux V}.

From these values alone, we can draw the figure illustrated and find the local maxima. For example, since the normalized language for state 1 is the empty set, the set difference between states 1 and 5 gives all of the target language; so there is a (high) transition probability from state 1 to state 5. Similarly, since states 7 and 8 share some target language strings in common, such as S V, and do not share others, such as Adv S V and S V O, the learner can move from state 7 to 8 and back again. Many additional properties of the triggering learning system now become evident once the mathematical formalization has been given.
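The set-difference procedure can be carried out mechanically for this example. The following is a minimal sketch under these assumptions: the uniform distribution on the 12 strings of $L_5$ (as the text assumes later), the normalized languages exactly as listed above, and GW's grammar numbers paired with the parameter arrays given earlier for the 3-parameter system (e.g., grammar 4 = [1 0 1]). Because the list above does not give $L'_7$, the sketch only computes rows for states whose three Hamming neighbors all appear in it (states 1, 2, 4, and 6); the names `STATES` and `transition_row` are illustrative, not from the paper.

```python
from fractions import Fraction

# Target L_5 (SVO, -V2): the 12 degree-0 strings listed in the example.
L5 = {
    "S V", "S V O", "S V O1 O2", "S Aux V", "S Aux V O", "S Aux V O1 O2",
    "Adv S V", "Adv S V O", "Adv S V O1 O2",
    "Adv S Aux V", "Adv S Aux V O", "Adv S Aux V O1 O2",
}
SIX = {"S V", "S V O", "S V O1 O2", "S Aux V", "S Aux V O", "S Aux V O1 O2"}
THREE = {"S V", "S V O", "S Aux V"}

# GW grammar number -> (parameter array, normalized language L'_i = L_i & L_5).
# State 7 is omitted because L'_7 is not given in the list above.
STATES = {
    1: ((1, 1, 0), set()),
    2: ((1, 1, 1), SIX),
    3: ((1, 0, 0), set()),
    4: ((1, 0, 1), THREE),
    5: ((0, 1, 0), L5),
    6: ((0, 1, 1), SIX),
    8: ((0, 0, 1), THREE),
}
BY_BITS = {bits: num for num, (bits, _) in STATES.items()}


def prob(strings):
    """P of a subset of L_5 under the uniform distribution (1/12 per string)."""
    return Fraction(len(strings), len(L5))


def transition_row(i, n=3):
    """Row i of the TLA transition matrix: P[i -> k] = (1/n) P(L'_k minus L'_i)."""
    bits, li = STATES[i]
    row = {}
    for p in range(n):
        neighbor = tuple(b ^ 1 if q == p else b for q, b in enumerate(bits))
        k = BY_BITS[neighbor]  # raises KeyError if the neighbor is state 7
        row[k] = Fraction(1, n) * prob(STATES[k][1] - li)
    row[i] = 1 - sum(row.values())  # self-transition: the probability left over
    return row
```

Under these assumptions, `transition_row(1)` puts 1/3 on the direct move to the target, 1/6 on state 2, and 1/2 on staying put; row 2 comes out absorbing, and row 4 can only leak to state 2 (with probability 1/12), consistent with the discussion of states 2 and 4 above.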



It is easy to imagine other alternatives to the TLA that will avoid the local maxima problem. For example, as it stands, the learner only changes a parameter setting if that change allows the learner to analyze a sentence it could not analyze before. If we relax this condition so that in this situation the learner picks a parameter at random to change, then the problem with local maxima disappears, because there can be only 1 Absorbing State, namely the target grammar. All other states have exit arcs. Thus, by our main theorem, such a system is learnable. Or consider, for example, the possibility of noise, that is, occasionally the learner gets strings that are not in the target language. GW state (fn. 4, p. 5) that this is not a problem: the learner need only pay attention to frequent data. But this is of course a serious problem for the model. Unless some kind of memory or frequency-counting device is added, the learner cannot know whether the example it receives is noise or not. This being so, there is always some finite probability, however small, of escaping a local maximum. It appears that the identification-in-the-limit framework as given is simply incompatible with the notion of noise, unless a memory window of some kind is added.

We may now proceed to ask the following questions about the TLA more precisely:

1. Does it converge?
2. How fast does it converge? How does this vary with distributional assumptions on the input examples?
3. Can we now compute the dynamics for other "natural" parameter systems, like the 10-parameter system for the acquisition of stress in languages developed by [4]?
4. Variants of the TLA would correspond to other Markov structures. Do they converge? If so, how fast?
5. How does the convergence time scale up with the number of parameters?
6. What is the computational complexity of learning parametrized language families?

… whether any local maxima exist. One could also look at other issues (like stationarity or ergodicity assumptions) that might potentially affect convergence. Later we will consider several variants of the TLA and see how these can be formally analyzed within the Markov formulation. We will also see that these variants do not suffer from the local maxima problem associated with GW's TLA.

Perhaps the most significant advantage of the Markov chain formulation is that it also allows us to analyze convergence times. Given the transition matrix of a Markov chain, the problem of how long it takes to converge has been well studied. This question is of crucial importance in learnability. Following GW, we believe that it is not enough to show that the learning problem is consistent, i.e., that the learner will converge to the target in the limit. We also need to show, as GW point out, that the learning problem is feasible, i.e., that the learner will converge in "reasonable" time. This is particularly true in the case of finite parameter spaces, where consistency might not be as much of a problem as feasibility. The Markov formulation allows us to attack the feasibility question. It also allows us to clarify the assumptions about the behavior of data and learner inherent in such an attack. We begin by considering a few ways in which one could formulate the question of convergence times.

3 Some Transition Matrices and Their Convergence Curves

Let us begin by following the procedure detailed in the previous section to actually obtain a few transition matrices. Consider the example which we looked at informally in the previous section. Here the target grammar was grammar 5, and the L' languages have already been obtained. For simplicity, let us first assume a uniform distribution on the strings in $L'_5$, i.e., the probability that the learner sees a particular string $s_j$ in $L'_5$ is 1/12, because there are 12 (degree-0) strings in $L'_5$. We can now compute the transition matrix as the following, where 0's occupy matrix entries if not otherwise specified:

7. Wat happens if we move from on- line to batch 
learning? Can we get PAG styl e bounds [6]? 

8. Wat does it mean t o have non- s t at i onar y ( noner- l 
godic) Mir kov s t r uct ur es ? Howdoes this relate to 2 
assumptions about parameter ordering and mat u- -^3 
r at i on? L4 

9. Wat other par arret ri zat i ons can we consider? 

^6 

I n t he re mai nde r of t hi s paper we s hal 1 c ons i de r t he s e £_ 
and other questions. W turn first to the question of ^ 
c onve r ge nc e and c onve rgence times. 

3 Gbnvergence Times for the Mrkov Notice that both 2 and 5 correspond to absorbing 

CVi • IXAHpI states; thus this chain suffers from the local rraxi rra 

problem Note also (following the previous figure as 
The Mirkov chain formulation gives us some distineil) that state 4 only exits to either itself or to stat 
advant ages intheoretically c har acterizing the 1 a2igiiBEgH; e is also a local rraxi mum M>re precisely, if T 
acqui si t i on probl em Fi r s t , we have al ready seeri fectwie t r ansi t i on pr obabi 1 i t y mat r i x of a cha^ri, t hen t 
gi ven a Mirkov Chai n one coul d i nves t i gat e whet herioie . t he el ement of T i n t he i t h row and j t h col urm i s 
not it has exactly one absor bi ng s t at e cor res ponditrige tpr obabi 1 i t y t hat the learner moves fromstate i to 
t he t ar get gr amrar . Thi s i s equi val ent t o t he quest isdrabf j i n one step. It is awell- known fact t hat i f one 
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consi der s t he cor r espondi ng i , j el errfrit krfnThi s i s not cl ear , pr esurrabl y t he i ssue of 1 ear nabi 1 i t y even i r 
is the probability that the learner neves f r oms ttaltes 3i- par arret er case deserves r e- exarri nat i on i n 1 i ght of 
t o s t at e j i n m s t e ps . For 1 e ar nabi 1 i t y to hoi d i rtrkisspfxDS s i bi 1 i t y. 

t i ve of whi ch s t at e t he 1 ear ner s t ar t s in, t he pr obatfiBiitcyus 1 y one can exarri ne ot her det ai 1 s of t hi s par- 
t hat t he 1 ear ner reaches s t at e 5 shoul d tendto 1 kEcnl ar system However , let us nowl ook at a case where 
goe s t o i nfini t y. Thi s rre ans t hat c ol urm IfGsfhfJril d t he r e i s no 1 o c al rraxi rra pr obi e m Thi s is t he c as e whe n 
containall l's, and t he rrat ri x shoul d cont ai n 0' s ehertyar get languages have verb- second ( V2) movement 
where else. Act ual 1 y we find t heft converges t o t he in GWs 3- par arret er case . Cbnsi der t he t r ansi t i on rra- 
f ol 1 owi ng rrat r i x as mgoes t o i nfini t y: t r i x obt ai ned when t he t ar get 1 anguage iAgal n we 

assume a uni f or mdi s t r i but i on on s t r i ngs of t he t ar get . 
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Pi(rn) 
P 2 (rn) 
Pz(rn) 
Pi(rn) 
Ph(rn) 
Pe,(rn) 
P 7 (rn) 
Ps(rn) 



Exarri ni ng t hi s rrat r i x we see t hat if t he 1 e ar ne r starts 
out instates 2 or 4, it will certainlyendupin st at eH&ierwe find t hat T does i ndeed converge to a rratri x 
the 1 i rri t . These two states correspond to 1 ocal rrawi bh 1' s i n t he first col urm and 0' s el se where. Cbnsi der 
gr arrrrars i n t he GWf r amework. If the 1 earner st art stiiB first col urm of"T It is of the form 
either of these two states, it will never reachthe target. 
Fr omt he rrat rixwe also see t hat if t he 1 e ar ne r starts in 
s t at es 5 t hr ough 8, it will certainly converge i n t he 1 i rri t 
t o t he target grammar. 

The s i t uat i on r e gar di ng s t at e s 1 and 3 i s rror e i nt e r - 
esting. If the learner starts in either of these states, it 
wi 1 1 r each t he t ar get gr amrar wi t h pr obabi 1 i t y 2/3 and 
r each s t at e 2, t he ot her absor bi ng s t at e wi t h pr obabi 1 i t y 
1/3. Thus we see that local rraxi rra are not the only 
problemfor 1 ear nabi 1 i t y. GW(p. 26 in ramus c r i ptj^ r e R denotes the probability of being in state 1 
focuses exclusively on local rraxi rra, and indirectly im, elK j of mexarrples in the case where the learner 
plies that these are the onl y di fffcul t states: "mgstlarqfed i n s t at e i . Nat ur al 1 y we want 
the source grammars have local triggers that enable the 
learner to get t o t he target. . . however, there exist pairs lirmp^rn) =1 

° ° L m+oo 

of source and target grammars f r omt he parameter space 

given in the table in Figure 3, such that no data M^rn 01- t hi s exarrpl e this is indeed the case. The next 

the target gr amrar will ever shift the learner out §gifflle s hows a pi ot of the following quantity as a function 

sour ce gr amrar . . . There are si x such pai r s of sour@e Pe-f ne number of exarrpl es . 

cal rraxi mum and target grammars" They then go on 

to list intheir figur e4, < «» such 1 ocal rraxi rra for t he 

t ar ge t gr amrar 5 , c or r e s pondi ng to states 2 and 4 . The quant i ty p(rri) is easytointerpret. Thus p(rn) = 

Wi lethis statement is strictly true, it does mbt9 Seme ans t hat for e ve r y i ni t i al s t at e of t he 1 e ar ne r t he 
haust t he set of sour ce states t hat never 1 ead t o t he pao^abi 1 i t y t hat it is i n t he t ar get s t at e af t er raexam 
gr amrar . As we see fr omt he t r ansi t i on rrat r i x, whpl es is at least 0. 95. Fur t her t her e is one i ni t i al s t at e ( t 
it is t r ue t hat s t at es 2 and 4 wi 1 1 , wi t h pr obabi wbt ytli,ni tial statewithrespect tothetarget, whi chin our 
not converge to t he t ar get gr amrar , it is al so t r ueetxbaaipl e i s ^) f or whi ch t hi s pr obabi 1 i t y i s exact 1 y 0. 95. 
states 1 and 3 wi 1 1 not conver ge t o t he t ar get . Thus^tfimd on 1 ooki ng at the curve that the learner con- 
number of "bad" i ni t i al hypot heses is si gni fie ant 1 y Yiargea - wi t h hi gh pr obabi litywithin 100 to 200 (degree-0) 
t han t hat pr esent ed i n Fi gur e 4 of GW Thi s di ffer encexaarpl e sentences, a psychol ogi cal 1 y pi ausi bl e number . 
agai n due t o t he newpr obabi 1 i s t i c f r amework i nt r od(iGnel can now of cour s e proceed to exarri ne act ual t r an- 
i n t he c ur r e nt paper, and in fact is related to ths cdiifB-t s of c hi 1 d i nput t o c al c ul at e c onve rgence times for 
cul t y found ear 1 i er wi t h t he cent r al convergence p'aobfial " di s t r i but i ons of exarrpl es , and we are cur rent 1 y 
1 ooki ng j us t at rri ni mal pat hs and cycl es i n f act rreagaged i n t hi s effor t . ) 

some possi bl e 1 ear ni ng pat hs . I n t he appendi x of t hi s Am-one example of the power of this approach, we 
per, we pr o vi de a c ompl e t e 1 i s t of al 1 s t ar t i ng s t at essanhd siiipar e t he c onve rgence t i rre of TLA to ot he r al - 
rri ght res ul t i n non- 1 ear nabi 1 i t y. Wi 1 e t he i mpl i c a^droiit <hfr$ . Perhaps the simplest is random walk: start 
t he exi s t ence of addi t i onal non- 1 ear nabi e s t ar t i ngfet kfeasner at a r andompoi nt i n t he 3- par arret er space , 



p(rn) =ni n{p 8 ( rri) } 



and then, if an i nput sentence cannot be analyzed, rsevetion. This rratrix has non-zero elements (transition 
r andorri y f r oms t at e t o s t at e . Not e t hat t hi s r egi mepaaafeabi lities) exactly where t he ear 1 i er rrat r i x had non- 
not suffer from the local rraxi rra problem, since theeEO elements. However, the value of each transition 
i s al ways some fini t e probabi 1 i t y of exi t i ng a non- tpatDgxatbi 1 i t y now depends upon a, 6 , c , and d. In parti cu- 
state. lar if we choose a =1/12, 6 =2/12, c =3/12, d =1/12 

To s at i sf y t he reader ' s cur i osi t y, we pr ovi de t h^thim- i s equi val ent to assurri ng a uni f or mdi s t r i but i on) 
verge nee curves for a r andomwal k al gor i t hm( RV%) ok obt ai n t he appr opr i at e t r ansi t i on mat r i x as before . 
the 8 state space. W find that the convergence tirfaesoking more closely at the general transition rratrix, 
are act ual 1 y f as t e r t han for t he TLA; see figure 2. $e nsee t hat t he t r ans i t i on pr obabi lity fromstate 2 to 
t he 15% i s al so super i or i n that it does not suffer fsrtoajhe lis (1 — ( a +6 +c) ) /3. Clearlyifwe make a arbi - 
t he same 1 ocal rraxi ma probl emas TLA, the concept uaflr ari lyclosetol, thenthis t r ansi t i on probabi 1 i t y i s ar 
support for the TLA i s by no means clear. Gf c our stir ,ari 1 y close to so that the nunber of samples needed 
i t may be that the TLA has errpi ri cal support , i n tthoe converge can be rrade ar bi t r ar i 1 y 1 ar ge . Thus choos- 
sense of i nde pendent evi dence that chi 1 dren do usei hhilsarge val ues f or a and srral 1 val ues for b wi 1 1 resul t i i 
procedure (gi ven by t he pat tern of t he ir errors, et d.a)r,gfeutonver gence t i mes . 
this evidence is lacking, as far as we know. Thi s means that the sample complexity cannot be 

Nowthat we have rrade a first attempt to quant i f y thbounded in a di stri buti on- free sense, because by choos- 

convergence time, several other questions can be ranged, hi ghl y unfavorable distribution the sample com 

How does convergence time depend upon t he di s t r i bplexi t y can be rrade as high as possible. For exam 

t i on of t he dat a? How doe s it c orrpar e wi t h ot he r ki pfe , we now gi ve t he c onve r ge nc e c ur ve s c al c ul at e d f or 

of Mrkov structures with the same nurrber of s t at eMffer ent choices of a, b, c, d. W see that for a uni - 

Howwill the convergence time be affected if the nrfrer mdi st ri but i on t he convergence occur s wi t hi n 200 s am 

ber of states i ncr eases , i . e the nurrber of par arret epd sea ri- By choosi ng a di stri buti on with a =0. 9999 and 

creases? How does i t depend upon the way i n whi ch = c = d = 0. 000001, the convergence ti me can be 

the parameters relate to the surface stri ngs? Ar e rAMsesi up t o as much as 50 nil 1 i on s arrpl es. ( Cf course, 

other ways to characteri ze convergence ti mes? W rtelws di stri buti on i s presurrabl y not psychol ogi cal 1 y real 

proceed to answer some of these quest i ons . i sti c. ) For a =0.99, b = c =rf=0.0001, the s arrpl e 

„ _, _ . • , • , , • corrpl exi t y i s on t he or der of 100, 000 posi t i ve exarrpl es . 

3.2 Distributional Assumptions 

Inthe earlier sectionwe assumed t hat t he dat a was 3m3 - Asorpti on 11 ires 

f or ni y di s t r i but ed. W corrput ed t he t r ansi t i on rratr i x. . ; . . ; . . ; . 

, • i , , i ii l j. l j. Intne previous sections, wc orrput e d t Ire t r ans 1 1 1 on rra- 

1 or a part l cul ar t ar get 1 anguage and snowed t nat conver- . L . J <• i- , • i , • ii i , i , r 

, • r , i , r 1 nn nnn i T trixlora van ety or di s t r 1 but 1 ons and snowed t Ire r at e or 

gence 1 1 mes were or t Ire or der or 100-200 s arrpl es.lntnis T J J . . i;;i/\/;i ^ 

, • i j. i j. j. i j. • j jC onve rgence. In particular we plotted p ra, t he pr o b- 

sect l on we snowt nat t Ire convergence 1 1 mes depend cr air. . ; ° . ^ . , ; . ^ ; r 7 , , • • , • i 

• 1, ,, ,. , • , , • T j. • i abi 1 1 1 y or conver gi ng 1 r omt Ire most uni avor abl e l ni 1 1 al 

ci al 1 y upon t Ire di s t r l but l on. In part l cul ar we can zaopser •;/?,, r i \ TT ; i • 

i- . • i . • j-i -ii i ii .-state agai ns t ml t Ire nurrber or s arrpl es . However , t hi s 

a di s t r l but l on whi c h wi 1 1 rrake t Ire c onve reence tint as / ,° , v ; , ; . ' ' . 

, im j. i j • j. • i. j. • c is not the only way toe nar acterize c onve rgence times. 

1 ar ge as we want . thus t Ire di s t r l but l on- tree convergence . . ; . J . ■; ; ; . ; . ; . ; ° 1;1 . 
,-p , i o j. j. • ■ c ■ i. aven an initial state, the tint t aken to reach t he ab- 

time tor the 3-par ant t e r s ys t e mi s l nhm t e . ; . ; ; , . ' ; . . ; . , • \ • 

A , o -i , i • , , • i , i , sorption state known as t he absorption tint is a r an- 

As bet or e , we consi der t he sit uat l on where t he t a>get • , , „ v . , , ' ■ 

, j- m ii- ii domvar i abl e . uie can conpute the man and variance 

1 anguage l s i_L there are no 1 ocal rraxi rra probl errs . ; . . . • , , ^ ; i ^ , ^ 

o . i • i • tit i • i i , , • , i i- , • i ,ol t hi s r andomvar l abl e . lor t he case when t he t ar get 

t or t hi s choi ce . W begi n by 1 et 1 1 ng t he di s t r l but.i on 15e . . ; . ; ; . . ; . ; .° 

, • , , , , -ii 7 7 j 1 anguage l s i L we have s e e n t hat t he t r ans 1 1 1 on rrat r l x 

par arret r l zed by t he van abl es a, b , c, d wher e . ° . ° . ^ 

has the form 

a = F(A={Adv VS}) / 1 Q 

b = P(B={Adv V OS, AivAuxVS}) T= ( B O 

c = P( [C = {Adv VOl C2 S, AivAuxV OS, V V 

Aiv Aux VOl 02 S}) Here Q i s a 7- di mensi onal square rratrix. The mean 

d = P(D ={VS}) absor pt i on t i mes f r oms t at es 2 t hr ough 8 i s gi ven by t he 

Thus each of the sets A, B, C and ^contain di ffer el ct or ( see I s aacson and Mdsen [3]) 
degree- sent e nee s qi d ear 1 y t he probabi 1 i t y of t he _ 1 

set k \ {AUBUCUE} is 1 -(a +b +c +d) . The »={I-Q) 1 

elements of each defined subset pferf equal 1 y 1 1 kel y^ llg a 7 _ dl mnsl ona i colunn vector of ones. The 

with respect to each other. Set 1 1 ng pos 1 1 1 ve valu^.^^ seco]ldromDts ls glvenby 
a, b , c, a such t hat a +b +c +d < 1 no w de hne s a uni que 

probabi 1 i t y f or each degree ( 0) sent ep.ceFim efxam n' ={ I —Q) ~ l { 2n —1) 

pi e , t he probabi 1 i t y of Adv VOS is 6/2, t he probabi 1 i t y of 

Adv AuxVOS i s c/3, that of VOS i s ( 1 — ( a+b +c +d) ) /ftjsi ng this result, we can nowcompute the mean and 
and so on. s t andar d devi at i on of t he absor pt i on t i me f r omt he most 

W c an no w obt ai n t he t r ans i t i on rrat rix corres pomuM avor abl e i ni t i al state of t he learner. ( W not e t hat 
i ng t o t hi s di s t r i but i on. Thi s i s shown i n Tabl e l.t he second moment is fairly skewed in such cases and so 

Compare t hi s rrat rix wi t h t hat obt ai ned wi t h a uriis- not syrrmet r i c about t he mean, as rray be seen from 
for mdi stri but i on on t he sent enceg iorf tihe ear 1 i er t he pr evi ous curves . ) 
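The quantities used in Sections 3.1 and 3.3 reduce to elementary matrix arithmetic: the powers $T^m$, the worst-case curve $p(m)$, absorption probabilities $(I - Q)^{-1} R$, and the absorption-time moments $\mu = (I - Q)^{-1}\mathbf{1}$ and $\mu' = (I - Q)^{-1}(2\mu - \mathbf{1})$. The following is a minimal Python sketch. It assumes small hypothetical chains, not the matrices of the paper, whose entries are not reproduced here. The function and variable names are illustrative. Exact `Fraction` arithmetic keeps the toy answers exact.

```python
from fractions import Fraction
import math


def identity_minus(Q):
    """Return I - Q for a square matrix Q stored as a list of rows."""
    n = len(Q)
    return [[Fraction(int(i == j)) - Q[i][j] for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (entries are Fractions)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def mat_pow(T, m):
    """T^m: entry (i, j) is the probability of going from state i to state j in m steps."""
    n = len(T)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = [[sum(result[i][k] * T[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]
    return result


def worst_case_convergence(T, target, m):
    """p(m) = min_i p_i(m): probability of being in the target after m examples,
    for the least favorable initial state."""
    return min(row[target] for row in mat_pow(T, m))


def absorption_probabilities(Q, R):
    """B = (I - Q)^{-1} R: B[i][k] is the probability that transient state i
    is eventually absorbed in absorbing state k."""
    A = identity_minus(Q)
    cols = [solve(A, [row[k] for row in R]) for k in range(len(R[0]))]
    return [[col[i] for col in cols] for i in range(len(Q))]


def absorption_moments(Q):
    """Mean mu = (I - Q)^{-1} 1 and second moment (I - Q)^{-1} (2 mu - 1)
    of the absorption time from each transient state."""
    A = identity_minus(Q)
    mu = solve(A, [Fraction(1)] * len(Q))
    second = solve(A, [2 * x - 1 for x in mu])
    return mu, second


# Hypothetical chain, not from the paper: state 0 is the absorbing target,
# states 1 and 2 are transient.
T_TOY = [[Fraction(1), Fraction(0), Fraction(0)],
         [Fraction(1, 4), Fraction(3, 4), Fraction(0)],
         [Fraction(0), Fraction(1, 2), Fraction(1, 2)]]
Q_TOY = [row[1:] for row in T_TOY[1:]]  # transient-to-transient block

mu, second = absorption_moments(Q_TOY)
std = [math.sqrt(s - x * x) for s, x in zip(second, mu)]

# Hypothetical transient state with two absorbing exits (target, non-target sink).
Q_SPLIT = [[Fraction(1, 2)]]
R_SPLIT = [[Fraction(1, 3), Fraction(1, 6)]]
split = absorption_probabilities(Q_SPLIT, R_SPLIT)
```

For the toy chain, `mu` is [4, 6] and the second moments are [28, 50]. The standard deviations are therefore $\sqrt{12}$ and $\sqrt{14}$, and `worst_case_convergence(T_TOY, 0, 2)` is 1/8. For the two-exit example, `split` is [[2/3, 1/3]]. This is how a non-sink state can reach the target only with probability less than 1.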
3.4 Eigenvalue Rates of Convergence

In classical Markov chain theory, there are also well-known convergence theorems derived from a consideration of the eigenvalues of the transition matrix. We state without proof a convergence result for transition matrices stated in terms of their eigenvalues.

Theorem 3. Let $T$ be an $n \times n$ transition matrix with $n$ linearly independent left eigenvectors $x_1, \ldots, x_n$ corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $x_0$ (an $n$-dimensional vector) represent the starting probability of being in each state of the chain, and let $\pi$ be the limiting probability of being in each state. Then after $k$ transitions, the probability of being in each state, $x_0 T^k$, can be described by

$$\left\| x_0 T^k - \pi \right\| = \left\| \sum_{i=2}^{n} \lambda_i^k \,(x_0 y_i)\, x_i \right\| \le \max_{2 \le i \le n} |\lambda_i|^k \sum_{i=2}^{n} \left\| (x_0 y_i)\, x_i \right\|$$

where the $y_i$'s are the right eigenvectors of $T$.

This theorem thus bounds the rate of convergence to the limiting distribution $\pi$ (in cases where there is only one absorption state, $\pi$ will have a 1 corresponding to that state and 0 everywhere else). Using this result, we can now bound the rates of convergence (in terms of the number $k$ of samples) by:

Learning scenario        Rate of convergence
TLA (uniform)            $O(0.94^k)$
TLA ($a = 0.99$)         $O((1 - 10^{-4})^k)$
TLA ($a = 0.9999$)       $O((1 - 10^{-6})^k)$
RW                       $O(0.89^k)$

This theorem also helps us to see the connection between the number of examples and the number of parameters, since a chain with $n$ states (corresponding to an $n \times n$ transition matrix) represents a language family with $\log_2 n$ parameters.

4 Batch Learning Upper and Lower Bounds

So far we have discussed a memoryless learner moving from state to state in parameter space and hopefully converging to the correct target in finite time. As we saw, this was well modeled by our Markov formulation. In this section, however, we step back and consider upper and lower bounds for learning finite language families if the learner were allowed to remember all the strings encountered and optimize over them. Needless to say, this might not be a psychologically plausible assumption, but it can shed light on the information-theoretic complexity of the learning problem.

Consider a situation where there are $n$ languages $L_1, L_2, \ldots, L_n$ over an alphabet $\Sigma$. Each language can be represented as a subset of $\Sigma^*$:

$$L_i = \{\omega_{i1}, \omega_{i2}, \ldots\} \subseteq \Sigma^*$$

The learner is provided with positive data (strings that belong to the language) drawn according to a distribution $P$ on the strings of a particular target language. The learner is to identify the target. It is quite possible that the learner receives strings that are in more than one language. In such a case the learner will not be able to uniquely identify the target. However, as more data become available, the probability of having received only ambiguous strings becomes smaller and smaller, and eventually the learner will be able to identify the target uniquely. An interesting question to ask, then, is how many samples the learner needs to see so that with high confidence it is able to identify the target, i.e., so that the probability that after seeing that many samples the learner is still ambiguous about the target is less than $\delta$. The following theorem provides a lower bound.

Theorem 4. The learner needs to draw at least

$$M = \max_{j \ne t} \frac{1}{\ln(1/p_j)} \ln(1/\delta)$$

samples (where $p_j = P(L_t \cap L_j)$) in order to be able to identify the target with confidence greater than $1 - \delta$.

Proof. Suppose the learner draws $m$ (less than $M$) samples. Let $k = \arg\max_{j \ne t} \frac{1}{\ln(1/p_j)}$. This means 1) that $M = \frac{1}{\ln(1/p_k)} \ln(1/\delta)$ and 2) that with probability $p_k$ the learner receives a string which is in both $L_k$ and $L_t$. Hence it will be unable to discriminate between the target and the $k$th language. After drawing $m$ samples, the probability that all of them belong to the set $L_t \cap L_k$ is $(p_k)^m$. In such a case, even after seeing $m$ samples, the learner will be in an ambiguous state. Now $(p_k)^m > (p_k)^M$ since $m < M$ and $p_k < 1$. Finally, since $M \ln(1/p_k) = \ln((1/p_k)^M) = \ln(1/\delta)$, we see that $(p_k)^m > \delta$. Thus the probability of being ambiguous after $m$ examples is greater than $\delta$, which means that the confidence of being able to identify the target is less than $1 - \delta$. ∎

This simple result allows us to assess the number of samples we need to draw in order to be confident of correctly identifying the target. Note that if the distribution of the data is very unfavorable, that is, if the probability of receiving ambiguous strings is quite high, then the number of samples needed can actually be quite large.

While the previous theorem provides the number of samples necessary to identify the target, the following theorem provides an upper bound for the number of samples that are sufficient to guarantee identification with high confidence.

Theorem 5. If the learner draws more than

$$M = \frac{1}{\ln(1/(1 - b_t))} \ln(1/\delta)$$

samples, then it will identify the target with confidence greater than $1 - \delta$. (Here $b_t = P(L_t \setminus \bigcup_{j \ne t} L_j)$.)

Proof. Consider the set $L' = L_t \setminus \bigcup_{j \ne t} L_j$. Any element of this set is present in the target language $L_t$ but not in any other language. Consequently, upon receiving such a string, the learner will be able to instantly identify the target. After $m > M$ samples, the probability that the learner has not received any member of this set
is $(1 - P(L'))^m = (1 - b_t)^m < (1 - b_t)^M = \delta$. Hence the probability of seeing some member of $L'$ in those $m$ samples is greater than $1 - \delta$. But seeing such a member enables the learner to identify the target, so the probability that the learner is able to identify the target is greater than $1 - \delta$ if it draws more than $M$ samples. ∎

To summarize, this section provides a simple upper and lower bound on the sample complexity of exact identification of the target language from positive data. The $\delta$ parameter that measures the learner's confidence of being able to identify the target is suggestive of a PAC [6] formulation. However, there is a crucial difference. In the PAC formulation, one is interested in an $\epsilon$-approximation to the target language with at least $1 - \delta$ confidence. In our case, this is not so. Since we are not allowed to approximate the target, the sample complexity shoots up with the choice of unfavorable distributions.

There are some interesting directions one could follow within this batch learning framework. One could try to get true PAC-style distribution-free bounds for various kinds of language families. Alternatively, one could use the exact identification results here for linguistically plausible language families with reasonable probability distributions on the data. It might be an interesting exercise to recompute the bounds for cases where the learner receives both positive and negative data. Finally, the bounds obtained here could be sharpened further. We intend to look into some of these questions in the future.

5 Variants of the Learning Model

We have so far focused on the TLA scheme for learning. The TLA observes the single value and greediness constraints. There could be several variants of this learning algorithm, and many of these are captured completely by our Markov formulation. We consider the following three simple variants, obtained by dropping either or both of the Single Value and Greediness constraints:

Random walk with neither greediness nor single value constraints: We have already seen this example before. The learner is in a particular state. Upon receiving a new sentence, it remains in that state if the sentence is analyzable. If not, the learner moves uniformly at random to any of the other states and stays there waiting for the next sentence. This is done without regard to whether the new state allows the sentence to be analyzed.

Random walk with no greediness but with single value constraint: The learner remains in its original state if the new sentence is analyzable. Otherwise the learner chooses one of the parameters uniformly at random and flips it, thereby moving to an adjacent state in the Markov structure. Again this is done without regard to whether the new state allows the sentence to be analyzed. However, since only one parameter is changed at a time, the learner can only move to neighboring states at any given time.

Random walk with no single value constraint but with greediness: The learner remains in its original state if the new sentence is analyzable. Otherwise the learner moves uniformly at random to any of the other states and stays there iff the sentence can be analyzed. If the sentence cannot be analyzed in the new state, the learner remains in its original state.

Fig. 4 shows the convergence times for these three algorithms when $L_1$ is the target language. Interestingly, all three perform better than the TLA for this task. Further, they do not suffer from local maxima problems. It should be pointed out, however, that the differences from the TLA are marginal, and this convergence has been shown only for $L_1$ as the target language. Ideally, the convergence rates have to be computed for each target language, and then either a worst-case or an average-case rate should be decided upon to characterize the convergence times of the algorithm on the language family as a whole.

6 Conclusion, Open Questions, and Future Directions

As the number of parameters $n$ increases, the size of the corresponding Markov matrix grows as $2^n \times 2^n$. In the case of a 10-parameter system, as found in models of English stress ([4]), the corresponding Markov structure will be a $1024 \times 1024$ matrix. We are currently conducting an analysis of this larger system to find its local maxima, analyze its convergence times, and see if its convergence times correspond to what one might find in practice with real stress systems.

Additional questions remain to be answered. One issue has to do with the smoothness relation between the parameter settings and the resulting surface strings. In principles-and-parameters theory, it has often been suggested that a small parameter change could lead to a large deductive change in the grammar, hence a large change in the surface language generated. In all the examples considered so far there is a smooth relation between surface sentences and parameters, in that switching from a V2 to a non-V2 system, for instance, leads us to a Markov state that is not too far away from the previous one. If this is not so, it is not so clear that the TLA will work as before. In fact, the whole question of how to formulate the notion of "smoothness" in a language-grammar framework is unclear. We know in the case of continuous functions, for example, that if the learner is allowed to choose examples (which can be simulated by selective attention), then such an "active" learner can approximate such functions much more quickly than a "passive" learner, like the one presented by GW. Is there an analog to this in the discrete, digital domain of language? How can one approximate a language? Here mathematics may play a helpful role, in that there is an analog to a functional analysis of languages, namely, the algebraic approach advanced by Chomsky and Schützenberger ([5]). In this model, a language is described by an (infinite) polynomial generating function, where the coefficients on the polynomial terms give the number of ways of deriving the string $x$. A (weak, string) approximation to a language can then be defined in terms of an approximation to the generating function. If this method can be deployed,
then one might be able to carry over the results of functional analysis and approximation for active vs. passive learners into the "digital" domain of language. If this is possible, we would then have a very powerful set of previously underutilized mathematical tools to analyze language learnability.

…probability of not converging to the target for each of these pairs. Note that this probability is non-zero for the pairs listed.

A.2 Remarks

1. We have provided a complete list of initial starting grammars from which some target is not learnable (i.e., not learnable with probability 1). We notice that there are three kinds of such problem starting states. Some states correspond to sinks in the Markov structure with respect to some target grammar. Here the learner gets stuck, never leaves, and correspondingly never converges to the target. Then there are states which are not sinks (OVS+V2 when the target is SVO-V2) but which can only move to some non-target sink, and so never converge to the target. These two kinds of problem states (starred in our table) have been listed by Gibson and Wexler in Fig. 4 (pg. 27 of manuscript). Finally, there are states which are not sinks, but which can with a non-zero probability converge to some non-target sink. They can also with a non-zero probability converge to the target, and in this respect are distinguished from problem states of type 2.

2. We would like to observe that of the 56 possible initial grammar-target grammar combinations, 12 result in non-learnable situations in the 3-parameter system investigated here. This is a fairly high density of unfavourable initial configurations. It would be interesting to see how this changes with other linguistic subsystems with a larger number of parameters.

3. We also did an analysis of convergence times under a uniform distribution for each target grammar. We find that the results are similar to the results displayed in the paper for the case when the target grammar is (VOS-V2). For cases when the target is learnable, the learner converges to the target within 100-200 samples with high (greater than 0.99) probability. Further, the variants of the TLA all outperform the TLA in terms of convergence times.
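The three kinds of problem starting states in Remark 1 can be told apart from the transition matrix by reachability alone. The following is a minimal Python sketch. It assumes that every closed class of the chain is a single absorbing state (a "sink"), as in the TLA chains discussed here, so that reaching a sink implies a non-zero chance of ending there. The toy matrix and the function names are hypothetical.

```python
def reachable(T, start):
    """Set of states reachable from start through non-zero transitions."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(T[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def classify_states(T, target):
    """Label each non-target state with the kind of problem state in Remark 1.

    Assumes every closed class is a single absorbing state (T[i][i] == 1).
    """
    sinks = {i for i in range(len(T)) if T[i][i] == 1}
    labels = {}
    for i in range(len(T)):
        if i == target:
            continue
        reach = reachable(T, i)
        if i in sinks:
            labels[i] = "type 1"      # non-target sink: the learner never leaves
        elif target not in reach:
            labels[i] = "type 2"      # not a sink, but leads only to non-target sinks
        elif reach & (sinks - {target}):
            labels[i] = "type 3"      # may reach the target or a non-target sink
        else:
            labels[i] = "learnable"   # every path ends in the target
    return labels


# Hypothetical 5-state chain (not from the paper); state 0 is the target.
T_TOY = [[1, 0, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0.5, 0.5, 0, 0],
         [0.2, 0.1, 0, 0.7, 0],
         [0.5, 0, 0, 0, 0.5]]

labels = classify_states(T_TOY, 0)
```

Here state 1 is a type-1 sink and state 2 (type 2) can only drain into it. State 3 (type 3) may end in either sink. State 4 is learnable.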




Figure 1: The 8 parameter settings in the GW example, shown as a Markov structure, with transition probabilities omitted. (Without transition probabilities, this diagram corresponds exactly to that in GW's appendix, as mentioned above.) Directed arrows between circles (states) represent possible nonzero (possible learner) transitions. The target grammar (in this case, number 5, setting [0 1 0]) lies at dead center. Around it are the three settings that differ from the target by exactly one binary digit; surrounding those are the 3 hypotheses two binary digits away from the target; the third ring out contains the single hypothesis that differs from the target by 3 binary digits. The learner can either cycle or step in or out one ring (binary digit) at a time, according to the single-step hypothesis; but some transitions are not possible because there is no data to drive the learner from one state to the other under the TLA.
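The ring layout in Figure 1 is the Hamming distance between binary parameter vectors. The single-step moves are the one-bit flips allowed by the single value constraint. The following is a minimal Python sketch. It assumes settings are encoded as 0/1 tuples in the same order as the caption's "[0 1 0]", without asserting which bit is which parameter. The function names are hypothetical.

```python
from itertools import product


def hamming(u, v):
    """Number of binary parameters on which two settings differ."""
    return sum(a != b for a, b in zip(u, v))


def rings(target, n_params=3):
    """Group all 2**n_params settings by their Hamming distance from target."""
    groups = {}
    for setting in product((0, 1), repeat=n_params):
        groups.setdefault(hamming(setting, target), []).append(setting)
    return groups


def single_step_neighbors(setting):
    """Settings reachable by flipping exactly one parameter."""
    return [tuple(1 - b if i == k else b for i, b in enumerate(setting))
            for k in range(len(setting))]


target = (0, 1, 0)
layout = rings(target)
```

For the target [0 1 0], the rings have sizes 1, 3, 3 and 1, matching the caption. Ring 3 is the single setting (1, 0, 1). The neighbors of any state all lie one ring in or out, so a single-step learner can never jump two rings at once.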